In [1]:
library(Seurat)
library(ggplot2)
library(dplyr)
options(repr.plot.width=10, repr.plot.height=8)
library(harmony)
library(clusterProfiler)
library(enrichplot)
library(msigdbr)
library(stringr)
set.seed(2025)
Loading required package: SeuratObject
Loading required package: sp
‘SeuratObject’ was built under R 4.4.1 but the current version is
4.4.2; it is recomended that you reinstall ‘SeuratObject’ as the ABI
for R may have changed
‘SeuratObject’ was built with package ‘Matrix’ 1.6.5 but the current
version is 1.7.3; it is recomended that you reinstall ‘SeuratObject’ as
the ABI for ‘Matrix’ may have changed
Attaching package: ‘SeuratObject’
The following objects are masked from ‘package:base’:
intersect, t
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Loading required package: Rcpp
clusterProfiler v4.14.6 Learn more at https://yulab-smu.top/contribution-knowledge-mining/
Please cite:
Guangchuang Yu, Li-Gen Wang, Yanyan Han and Qing-Yu He.
clusterProfiler: an R package for comparing biological themes among
gene clusters. OMICS: A Journal of Integrative Biology. 2012,
16(5):284-287
Attaching package: ‘clusterProfiler’
The following object is masked from ‘package:stats’:
filter
enrichplot v1.26.6 Learn more at https://yulab-smu.top/contribution-knowledge-mining/
Please cite:
Guangchuang Yu. Gene Ontology Semantic Similarity Analysis Using
GOSemSim. In: Kidder B. (eds) Stem Cell Transcriptional Networks.
Methods in Molecular Biology. 2020, 2117:207-215. Humana, New York, NY.
For full functionality, please install the 'msigdbdf' package with:
install.packages('msigdbdf', repos = 'https://igordot.r-universe.dev')
In [2]:
in_path <- "~/projects/2024/RA/GSE163121_RAW/"
out_path <- "~/projects/2024/RA/GSE163121_RAW/out/"
In [3]:
d_name <- c('GSM4972213', 'GSM4972214', 'GSM4972215', 'GSM4972216', 'GSM4972217')
In [4]:
scdata <- list()
for (i in 1:length(d_name)) {
data <- Read10X(data.dir = paste0(in_path,d_name[i]))
scdata[[i]] <- CreateSeuratObject(counts = data, project = d_name[i])
}
scdata
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
Warning message:
“Feature names cannot have underscores ('_'), replacing with dashes ('-')”
[[1]] An object of class Seurat 33694 features across 5514 samples within 1 assay Active assay: RNA (33694 features, 0 variable features) 1 layer present: counts [[2]] An object of class Seurat 33694 features across 4121 samples within 1 assay Active assay: RNA (33694 features, 0 variable features) 1 layer present: counts [[3]] An object of class Seurat 33694 features across 3516 samples within 1 assay Active assay: RNA (33694 features, 0 variable features) 1 layer present: counts [[4]] An object of class Seurat 33694 features across 7351 samples within 1 assay Active assay: RNA (33694 features, 0 variable features) 1 layer present: counts [[5]] An object of class Seurat 33694 features across 4535 samples within 1 assay Active assay: RNA (33694 features, 0 variable features) 1 layer present: counts
In [5]:
GSE163121 <- merge(scdata[[1]], y = scdata[-1], add.cell.ids = d_name)
In [6]:
GSE163121
An object of class Seurat 33694 features across 25037 samples within 1 assay Active assay: RNA (33694 features, 0 variable features) 5 layers present: counts.GSM4972213, counts.GSM4972214, counts.GSM4972215, counts.GSM4972216, counts.GSM4972217
In [7]:
GSE163121[["percent.mt"]] <- PercentageFeatureSet(GSE163121, pattern = "^MT-")
In [8]:
GSE163121
An object of class Seurat 33694 features across 25037 samples within 1 assay Active assay: RNA (33694 features, 0 variable features) 5 layers present: counts.GSM4972213, counts.GSM4972214, counts.GSM4972215, counts.GSM4972216, counts.GSM4972217
In [9]:
VlnPlot(GSE163121, features = c("nFeature_RNA"), pt.size = 0)
VlnPlot(GSE163121, features = c("nCount_RNA"), pt.size = 0)
VlnPlot(GSE163121, features = c("percent.mt"), pt.size = 0)
Warning message: “Default search for "data" layer in "RNA" assay yielded no results; utilizing "counts" layer instead.” Warning message: “Default search for "data" layer in "RNA" assay yielded no results; utilizing "counts" layer instead.”
Warning message: “Default search for "data" layer in "RNA" assay yielded no results; utilizing "counts" layer instead.”
In [10]:
p1 <- FeatureScatter(GSE163121, feature1 = "nCount_RNA", feature2 = "percent.mt", raster = F)
p2 <- FeatureScatter(GSE163121, feature1 = "nCount_RNA", feature2 = "nFeature_RNA", raster = F)
p1
p2
In [11]:
GSE163121 <- subset(GSE163121, subset = nFeature_RNA > 200 & nFeature_RNA < 5000 & percent.mt <20)
GSE163121 <- JoinLayers(GSE163121)
GSE163121
An object of class Seurat 33694 features across 24992 samples within 1 assay Active assay: RNA (33694 features, 0 variable features) 1 layer present: counts
In [12]:
GSE163121 <- NormalizeData(GSE163121, normalization.method = "LogNormalize", scale.factor = 10000)
GSE163121 <- FindVariableFeatures(GSE163121, selection.method = "vst", nfatures = 2000)
GSE163121 <- ScaleData(GSE163121, features = rownames(GSE163121))
GSE163121 <- RunPCA(GSE163121)
Normalizing layer: counts Finding variable features for layer counts Centering and scaling data matrix PC_ 1 Positive: TYROBP, LYZ, FCER1G, S100A8, S100A9, CSTA, FCN1, S100A12, THBS1, C15orf48 G0S2, CSF3R, MAFB, CLEC7A, IL1B, TREM1, S100A11, SLC11A1, CEBPD, CYP1B1 IER3, SERPINA1, SEMA6B, C5AR1, PTGES, CTSD, CDA, TNFAIP2, SERPINB2, CFD Negative: EEF1A1, CD74, RPS18, RPLP0, RPLP1, IGHM, MT-CYB, FCER2, TCL1A, ISG20 NEIL1, JUN, IGKC, MYC, HIST1H4C, MALAT1, KIAA0125, IL6, IGLC3, STAG3 INSIG2, IGKV3-20, ZNF595, POU2AF1, IGHV5-78, IGHV3-30, IRF4, RP5-887A10.1, IGKV1-12, IGKV1-5 PC_ 2 Positive: FTH1, THBS1, S100A12, LYZ, CSTA, C15orf48, EEF1A1, S100A8, PTGES, TREM1 S100A9, SEMA6B, CD74, CYP1B1, G0S2, CSF3R, IL1B, FCN1, SLC11A1, SERPINB2 MAFB, CDA, SAT1, CLEC7A, IFNGR2, RPLP1, FTL, ANPEP, DMXL2, FCAR Negative: GZMA, GZMB, CTSW, PRF1, GZMM, CD7, CLIC3, NKG7, CD247, FGFBP2 IL2RB, SPON2, KLRF1, IGFBP7, ADGRG1, KLRB1, S1PR5, ZAP70, GZMH, IL32 SH2D1B, IFITM1, TTC38, C1orf21, SPN, PRKCH, SAMD3, TRBC1, CCL5, MATK PC_ 3 Positive: MALAT1, FTH1, CD7, GZMA, GZMB, CD247, NKG7, IL32, GZMM, PRF1 EEF1A1, CTSW, KLRB1, FGFBP2, IL2RB, SPON2, CLIC3, PRKCH, GZMH, ZAP70 KLRF1, ADGRG1, S1PR5, CD3E, CCL5, NEAT1, MATK, FCER1G, TRBC1, IGFBP7 Negative: TXNDC5, TNFRSF17, MZB1, DERL3, AQP3, HRASLS2, CHPF, POU2AF1, ITM2C, CD27 IGLL5, CPNE5, DCPS, TMSB10, MYDGF, IGKC, PEBP1, PPIB, ISOC2, RP11-1070N10.3 SSR3, LSP1, SDF2L1, EAF2, FKBP11, PYCR1, CD74, GPRC5D, PDIA5, MT-CYB PC_ 4 Positive: CD3D, CD3E, CAMK4, CD6, CD2, MAL, NELL2, CD3G, AQP3, LEPROTL1 RPLP0, TRAT1, RGS10, TNFRSF25, RORA, EEF1A1, BCL11B, RPLP1, IL32, LCK DHRS3, CD28, NINJ1, GATA3, TRAC, FKBP11, CD27, LIME1, RPS26, CD96 Negative: CD74, MT-CO1, MT-CYB, IGHM, FGR, IFITM3, FCER2, TMSB10, IFI44L, LY6E RHOB, RGS2, IFITM2, CTSS, LSP1, TCL1A, ITGB2, SPON2, IGFBP7, C1orf162 XAF1, GPR18, GZMB, PRF1, GZMA, IFI6, CLIC3, KLRF1, PLD4, CH17-373J23.1 PC_ 5 Positive: HES4, NEAT1, MALAT1, LGALS1, SAT1, ZBTB32, RPS26, GPR137B, DUSP4, LMNA RHOB, SRGN, TNFRSF1B, BLVRB, LGMN, TUBB6, ENC1, SAMSN1, ATF3, AC104024.1 TMSB4X, CLECL1, CD86, RGCC, FTH1, SEC61B, KPNA2, SEMA7A, BCL2A1, RGS2 Negative: MT-CYB, TCL1A, FCER2, TMSB10, IGHM, C1orf162, CD3E, CD3D, JUN, LCK MYC, PIM1, STAG3, IGLL5, MAL, CD6, CDC25B, CD3G, CYBB, NOSIP IGHV5-78, RP11-164H13.1, EEF1A1, IL32, MID1IP1, FCGRT, PDE4D, CAMK4, APLP2, CD2
In [13]:
ElbowPlot(GSE163121)
In [14]:
DimPlot(GSE163121)
In [15]:
GSE163121 <- RunUMAP(GSE163121, dims = 1:9, reduction = "pca", reduction.name = "umap.unintegrated")
Warning message: “The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation' This message will be shown once per session” 12:15:35 UMAP embedding parameters a = 0.9922 b = 1.112 12:15:35 Read 24992 rows and found 9 numeric columns 12:15:35 Using Annoy for neighbor search, n_neighbors = 30 12:15:35 Building Annoy index with metric = cosine, n_trees = 50 0% 10 20 30 40 50 60 70 80 90 100% [----|----|----|----|----|----|----|----|----|----| * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * | 12:15:38 Writing NN index file to temp file /tmp/Rtmpv61tWw/file32c7911b450abd 12:15:38 Searching Annoy index using 1 thread, search_k = 3000 12:15:48 Annoy recall = 100% 12:15:49 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30 12:15:51 Initializing from normalized Laplacian + noise (using RSpectra) 12:15:52 Commencing optimization for 200 epochs, with 1022006 positive edges 12:15:52 Using rng type: pcg 12:16:07 Optimization finished
In [16]:
DimPlot(GSE163121)
In [17]:
Idents(GSE163121, cells = colnames(GSE163121)[is.element(GSE163121$orig.ident,c("GSM4972213","GSM4972214"))]) <- "HC"
Idents(GSE163121, cells = colnames(GSE163121)[is.element(GSE163121$orig.ident,c("GSM4972215","GSM4972216", "GSM4972217"))]) <- "SLE"
GSE163121$conditions <- Idents(GSE163121)
DimPlot(GSE163121, raster = F, group.by = "conditions")
In [18]:
GSE163121 <- RunHarmony(GSE163121,"orig.ident", plot_convergence = T, theta = 3)
Transposing data matrix Initializing state using k-means centroids initialization Harmony 1/10 Harmony 2/10 Harmony 3/10 Harmony converged after 3 iterations
In [19]:
harmony_embeddings <- Embeddings(GSE163121, "harmony")
harmony_embeddings[1:5, 1:5]
| harmony_1 | harmony_2 | harmony_3 | harmony_4 | harmony_5 | |
|---|---|---|---|---|---|
| GSM4972213_AAACCTGCAATGTTGC-1 | -2.5659808 | 1.4727543 | 0.7082829 | -0.4493815 | -1.9514298 |
| GSM4972213_AAACCTGGTCCGCTGA-1 | -2.6829934 | 0.9718616 | -0.2444526 | 0.2220524 | -0.8091164 |
| GSM4972213_AAACCTGGTCTAGGTT-1 | -0.5973693 | 0.9813738 | -1.2604122 | -0.1999167 | 0.3847095 |
| GSM4972213_AAACCTGGTCTGATCA-1 | -1.0194418 | 1.2523079 | 0.3056075 | -0.8770639 | -0.5351299 |
| GSM4972213_AAACCTGTCAACACAC-1 | -2.0482378 | 1.4868648 | 0.8508176 | -0.4472641 | -1.1899475 |
In [20]:
DimPlot(GSE163121, reduction = "harmony", group.by = "orig.ident",raster = F)
In [21]:
GSE163121 <- RunUMAP(GSE163121, dims = 1:9, reduction = "harmony")
12:16:38 UMAP embedding parameters a = 0.9922 b = 1.112 12:16:38 Read 24992 rows and found 9 numeric columns 12:16:38 Using Annoy for neighbor search, n_neighbors = 30 12:16:38 Building Annoy index with metric = cosine, n_trees = 50 0% 10 20 30 40 50 60 70 80 90 100% [----|----|----|----|----|----|----|----|----|----| * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * | 12:16:42 Writing NN index file to temp file /tmp/Rtmpv61tWw/file32c79188f6fee 12:16:42 Searching Annoy index using 1 thread, search_k = 3000 12:16:52 Annoy recall = 100% 12:16:53 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30 12:16:55 Initializing from normalized Laplacian + noise (using RSpectra) 12:16:56 Commencing optimization for 200 epochs, with 1014148 positive edges 12:16:56 Using rng type: pcg 12:17:14 Optimization finished
In [22]:
GSE163121 <- FindNeighbors(GSE163121, reduction = "harmony", dims = 1:9)
GSE163121 <- FindClusters(GSE163121, resolution = 0.5, cluster.name = "harmony_clusters")
Computing nearest neighbor graph Computing SNN
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck Number of nodes: 24992 Number of edges: 708266 Running Louvain algorithm... Maximum modularity in 10 random starts: 0.8399 Number of communities: 10 Elapsed time: 6 seconds
In [23]:
p1<- DimPlot(GSE163121, raster = F, reduction = "umap")
p1
p2<- DimPlot(GSE163121, raster = F, reduction = "umap", group.by = "orig.ident")
p2
p3 <- DimPlot(GSE163121, raster = F, reduction = "umap", group.by = "conditions")
p3
pdf(paste0(out_path, "UMAP_harmony.pdf"), width=8, height=6)
p1
p2
p3
dev.off()
pdf: 2
In [24]:
ABC_marker_1 <- c("CD19", "CD80", "CD86", "FAS", "FCRLA", "FCRLB", "FGL2", "CXCL10", "CXCR3", "NKG7", "ITGAM", "ITGB2", "TBX21", "ZBTB32", "FCER2", "CR2")
length(ABC_marker_1)
is.element(ABC_marker_1,rownames(GSE163121))
ABC_marker_1 <- ABC_marker_1[is.element(ABC_marker_1,rownames(GSE163121))]
16
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
- TRUE
In [25]:
fig1_1 <- FeaturePlot(GSE163121, features = ABC_marker_1[1:9], ncol = 3, reduction = "umap")
fig1_2 <- FeaturePlot(GSE163121, features = ABC_marker_1[10:16], ncol = 3, reduction = "umap")
pdf(paste0(out_path, "ABC_marker_feature.pdf"), width=12, height=10)
fig1_1
fig1_2
dev.off()
pdf: 2
In [26]:
source("~/projects/MK_Rcode/setZ.R")
GSE163121[["ABC_score"]] <- setZ(ABC_marker_1,rownames(GSE163121@assays$RNA$scale.data),GSE163121@assays$RNA$scale.data)
In [27]:
LSP1_gene <- c("CEBPA", "CEBPB", "IL1B", "CCL3", "TNF", "CCL4", "CCL5", "SPP1", "CD7", "MSR1", "ITGAM", "KIT", "CD14", "CSF3R", "IL1R2", "CSF1R", "ITGA1", "ITGA5", "ITGB5", "ITGB1", "CAMP", "THBS1", "C3", "MEFV", "FOS", "CYBB", "CARD9", "IRF7", "OAS3", "NLRP3", "CTSB", "IKBKE", "MPO")
length(LSP1_gene)
LSP1_gene_sub <- LSP1_gene[is.element(LSP1_gene,rownames(GSE163121))]
33
In [28]:
LSP1_gene_sub
- 'CEBPA'
- 'CEBPB'
- 'IL1B'
- 'CCL3'
- 'TNF'
- 'CCL4'
- 'CCL5'
- 'SPP1'
- 'CD7'
- 'MSR1'
- 'ITGAM'
- 'KIT'
- 'CD14'
- 'CSF3R'
- 'IL1R2'
- 'CSF1R'
- 'ITGA1'
- 'ITGA5'
- 'ITGB5'
- 'ITGB1'
- 'CAMP'
- 'THBS1'
- 'C3'
- 'MEFV'
- 'FOS'
- 'CYBB'
- 'CARD9'
- 'IRF7'
- 'OAS3'
- 'NLRP3'
- 'CTSB'
- 'IKBKE'
- 'MPO'
In [29]:
fig2_1 <- FeaturePlot(GSE163121, features = LSP1_gene_sub[1:9], ncol = 3, reduction = "umap")
fig2_2 <- FeaturePlot(GSE163121, features = LSP1_gene_sub[10:18], ncol = 3, reduction = "umap")
fig2_3 <- FeaturePlot(GSE163121, features = LSP1_gene_sub[19:27], ncol = 3, reduction = "umap")
fig2_4 <- FeaturePlot(GSE163121, features = LSP1_gene_sub[28:33], ncol = 3, reduction = "umap")
pdf(paste0(out_path, "LSP1_marker_feature.pdf"), width=12, height=10)
fig2_1
fig2_2
fig2_3
fig2_4
dev.off()
pdf: 2
In [30]:
GSE163121[["LSP1_score"]] <- setZ(LSP1_gene_sub,rownames(GSE163121@assays$RNA$scale.data),GSE163121@assays$RNA$scale.data)
In [43]:
FeaturePlot(GSE163121, features = "LSP1_score", reduction = "umap") +
scale_color_gradient2(low = "white", mid = "white", high = "blue", midpoint = 0)
FeaturePlot(GSE163121, features = "LSP1_score", reduction = "umap")
GSE163121$LSP1_score_pos <- GSE163121$LSP1_score > 0
DimPlot(GSE163121, group.by = "LSP1_score_pos", reduction = "umap")
FeaturePlot(GSE163121, features = "ABC_score") +
scale_color_gradient2(low = "white", mid = "white", high = "blue", midpoint = 0)
FeaturePlot(GSE163121, features = "ABC_score", reduction = "umap")
GSE163121$ABC_score_pos <- GSE163121$ABC_score > 0
DimPlot(GSE163121, group.by = "ABC_score_pos", reduction = "umap")
pdf(paste0(out_path, "score_feature.pdf"), width=7, height=4.5)
FeaturePlot(GSE163121, features = "LSP1_score", reduction = "umap") +
scale_color_gradient2(low = "white", mid = "white", high = "blue", midpoint = 0)
FeaturePlot(GSE163121, features = "LSP1_score", reduction = "umap")
DimPlot(GSE163121, group.by = "LSP1_score_pos", reduction = "umap")
FeaturePlot(GSE163121, features = "ABC_score", reduction = "umap") +
scale_color_gradient2(low = "white", mid = "white", high = "blue", midpoint = 0)
FeaturePlot(GSE163121, features = "ABC_score", reduction = "umap")
DimPlot(GSE163121, group.by = "ABC_score_pos", reduction = "umap")
dev.off()
Scale for colour is already present. Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present. Adding another scale for colour, which will replace the existing scale.
Scale for colour is already present. Adding another scale for colour, which will replace the existing scale. Scale for colour is already present. Adding another scale for colour, which will replace the existing scale.
pdf: 2
In [32]:
B_marker <- c("CD19", "CD79A", "CD79B", "MS4A1", "CD27", "CD24", "CD38", "IGHD")
T_marker <- c("CD4", "CD8A", "CD69", "CD70", "CD80", "CD40LG", "PDCD1", "HAVCR2")
FeaturePlot(GSE163121, features = B_marker, ncol = 3, reduction = "umap")
FeaturePlot(GSE163121, features = T_marker, ncol = 3, reduction = "umap")
pdf(paste0(out_path, "marker_expression.pdf"), width=12, height=12)
FeaturePlot(GSE163121, features = B_marker, ncol = 3, reduction = "umap")
FeaturePlot(GSE163121, features = T_marker, ncol = 3, reduction = "umap")
dev.off()
pdf: 2
In [33]:
LSP1_pos_ind <- GSE163121$LSP1_score_pos==TRUE & GSE163121$ABC_score_pos==FALSE
ABC_pos_ind <- GSE163121$LSP1_score_pos==FALSE & GSE163121$ABC_score_pos==TRUE
DP_ind <- GSE163121$LSP1_score_pos==TRUE & GSE163121$ABC_score_pos==TRUE
DN_ind <- GSE163121$LSP1_score_pos==FALSE & GSE163121$ABC_score_pos==FALSE
sum(LSP1_pos_ind)
sum(ABC_pos_ind)
sum(DP_ind)
sum(DN_ind)
3542
6386
3511
11553
In [34]:
Idents(GSE163121, cells = colnames(GSE163121)[LSP1_pos_ind]) <- "LSP1_positive"
Idents(GSE163121, cells = colnames(GSE163121)[ABC_pos_ind]) <- "ABC_positive"
Idents(GSE163121, cells = colnames(GSE163121)[DP_ind]) <- "Double_positive"
Idents(GSE163121, cells = colnames(GSE163121)[DN_ind]) <- "Double_negative"
DimPlot(GSE163121, reduction = "umap")
pdf(paste0(out_path,"umap_ABC_LSP1.pdf"), width = 8, height = 6)
DimPlot(GSE163121, reduction = "umap")
dev.off()
pdf: 2
In [35]:
markers <- FindAllMarkers(GSE163121, min.pct = 0.1, test.use = 'LR', logfc.threshold = 0.25)
GSE163121
Calculating cluster Double_negative Calculating cluster Double_positive Calculating cluster ABC_positive Calculating cluster LSP1_positive
An object of class Seurat 33694 features across 24992 samples within 1 assay Active assay: RNA (33694 features, 2000 variable features) 3 layers present: counts, data, scale.data 4 dimensional reductions calculated: pca, umap.unintegrated, harmony, umap
In [36]:
markers %>%
group_by(cluster) %>%
filter(avg_log2FC > 0) %>%
filter(p_val_adj < 0.05) %>%
arrange(desc(avg_log2FC)) %>%
slice_head(n = 5) %>%
ungroup() -> top5
# DoHeatmap(pbmc, features = top5$gene) + NoLegend()
DotPlot(GSE163121, features = top5$gene[!duplicated(top5$gene)] ) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
scale_color_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0)
pdf(paste0(out_path, "Dotplot.pdf"), width = 8, height = 6)
DotPlot(GSE163121, features = top5$gene[!duplicated(top5$gene)] ) +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
scale_color_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0)
dev.off()
Warning message: “Scaling data with a low number of groups may produce misleading results” Scale for colour is already present. Adding another scale for colour, which will replace the existing scale. Warning message: “Scaling data with a low number of groups may produce misleading results” Scale for colour is already present. Adding another scale for colour, which will replace the existing scale.
pdf: 2
In [37]:
unique(Idents(GSE163121))
- Double_negative
- LSP1_positive
- ABC_positive
- Double_positive
Levels:
- 'Double_negative'
- 'Double_positive'
- 'ABC_positive'
- 'LSP1_positive'
In [38]:
LSP1vsABC.marker <- FindMarkers(GSE163121, min.pct = 0.1, test.use = 'LR', logfc.threshold = 0.1, ident.1 = "LSP1_positive", ident.2 = "ABC_positive")
source("~/projects/MK_Rcode/volcano_scdata.R")
In [39]:
vol1 <- volcano_scdata(LSP1vsABC.marker, "LSP1 vs ABC")
pdf(paste0(out_path,"LSP1_vs_ABC_volcano.pdf"), width = 6, height = 6)
vol1
dev.off()
pdf: 2
In [40]:
hallmark_geneset <- msigdbr(species = "Homo sapiens", category = "H")
hallmark_geneset$gs_name_short <- str_extract(hallmark_geneset$gs_name, "(?<=HALLMARK_).*$")
sort_glist <- LSP1vsABC.marker$avg_log2FC
names(sort_glist) <- rownames(LSP1vsABC.marker)
sort_glist <- sort(sort_glist, decreasing = T)
gsea_result.LSP1vsABS <- GSEA(geneList = sort_glist, TERM2GENE = select(hallmark_geneset, gs_name_short, gene_symbol), seed = T, pvalueCutoff = 0.99)
if (!file.exists(paste0(out_path,"gsea_GSE163121_050725.rds"))) {
saveRDS(gsea_result.LSP1vsABS, paste0(out_path,"gsea_GSE163121_050725.rds"))
}
gsea_result.LSP1vsABS <- GSEA(geneList = sort_glist, TERM2GENE = select(hallmark_geneset, gs_name_short, gene_symbol), seed = T)
dotplot(gsea_result.LSP1vsABS, showCategory=10, split=".sign") + facet_grid(.~.sign)
Warning message:
“The `category` argument of `msigdbr()` is deprecated as of msigdbr 10.0.0.
ℹ Please use the `collection` argument instead.”
The 'msigdbdf' package must be installed to access the full dataset.
Please run the following command to install the 'msigdbdf' package:
install.packages('msigdbdf', repos = 'https://igordot.r-universe.dev')
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).
preparing geneSet collections...
GSEA analysis...
Warning message in fgseaMultilevel(pathways = pathways, stats = stats, minSize = minSize, :
“For some pathways, in reality P-values are less than 1e-10. You can set the `eps` argument to zero for better estimation.”
leading edge analysis...
done...
using 'fgsea' for GSEA analysis, please cite Korotkevich et al (2019).
preparing geneSet collections...
GSEA analysis...
Warning message in fgseaMultilevel(pathways = pathways, stats = stats, minSize = minSize, :
“For some pathways, in reality P-values are less than 1e-10. You can set the `eps` argument to zero for better estimation.”
leading edge analysis...
done...
In [41]:
pdf(paste0(out_path, "GSEA_LSP1vsABS.pdf"), width = 8, height = 6)
dotplot(gsea_result.LSP1vsABS, showCategory=10, split=".sign") + facet_grid(.~.sign)
dev.off()
pdf: 2
In [42]:
if (!file.exists(paste0(out_path,"GSE163121_metadata.rds"))) {
saveRDS(GSE163121@meta.data, file = paste0(out_path,"GSE163121_metadata.rds"))
}
In [44]:
if (!file.exists(paste0(out_path,"GSE163121_Sobj_050825.rds"))) {
saveRDS(GSE163121, paste0(out_path,"GSE163121_Sobj_050825.rds"))
}